NLI Shared Task 2013: MQ Submission
Abstract
Our submission to this NLI shared task used, for the most part, standard features found in recent work. Our focus was instead on two other aspects of the system: at a high level, possible ways of constructing ensembles of multiple classifiers; and at a low level, the granularity of the part-of-speech tags used as features. The choice of ensemble combination method made little difference to the results, although exploiting the differing behaviours of linear versus logistic regression SVM classifiers could be promising in future work; the part-of-speech tagsets, by contrast, showed noticeable differences. The overall architecture, with its feature set and ensemble approach, reached an accuracy of 83.1% on the test set when trained on both the supplied training and development data, close to the best result of the task. This suggests that simply combining the features of previous work achieves roughly state-of-the-art performance.
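To make the idea of an "ensemble combination method" concrete, here is a minimal sketch of two common rules: plurality voting over predicted labels, and averaging per-class probabilities. This is an illustration only, not the system described in the abstract; the function names, the three hypothetical base classifiers, and the labels `KOR` and `JPN` are all assumptions introduced for the example.

```python
from collections import Counter


def plurality_vote(predictions):
    """Return the label chosen by the most classifiers.

    Ties go to the label that appears first in the input order.
    """
    return Counter(predictions).most_common(1)[0][0]


def mean_probability(prob_dists):
    """Average each class's probability across classifiers, return the top label."""
    labels = prob_dists[0].keys()
    avg = {lab: sum(d[lab] for d in prob_dists) / len(prob_dists) for lab in labels}
    return max(avg, key=avg.get)


# Hypothetical outputs of three base classifiers for one essay
# (e.g. one per feature type). The numbers are made up for illustration.
dists = [
    {"KOR": 0.55, "JPN": 0.45},
    {"KOR": 0.10, "JPN": 0.90},
    {"KOR": 0.60, "JPN": 0.40},
]

# Each classifier's hard vote is its most probable label.
votes = [max(d, key=d.get) for d in dists]

if __name__ == "__main__":
    print("plurality vote:  ", plurality_vote(votes))
    print("mean probability:", mean_probability(dists))
```

On this toy input the two rules disagree: two of the three classifiers narrowly prefer `KOR`, but the one confident `JPN` prediction dominates the averaged probabilities. Cases like this are where the choice of combination rule can matter, even if, as the abstract reports, it made little difference overall.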
Related resources
NAIST at the NLI 2013 Shared Task
This paper describes the Nara Institute of Science and Technology (NAIST) native language identification (NLI) system in the NLI 2013 Shared Task. We apply feature selection using a measure based on frequency for the closed track and try Capping and Sampling data methods for the open tracks. Our system ranked ninth in the closed track, third in open track 1 and fourth in open track 2.
Using Other Learner Corpora in the 2013 NLI Shared Task
Our efforts in the 2013 NLI shared task focused on the potential benefits of external corpora. We show that including training data from multiple corpora is highly effective at robust, cross-corpus NLI (i.e. open-training task 1), particularly when some form of domain adaptation is also applied. This method can also be used to boost performance even when training data from the same corpus is av...
Feature Engineering in the NLI Shared Task 2013: Charles University Submission Report
Our goal is to predict the first language (L1) of the authors of English essays with the help of the TOEFL11 corpus, where L1, prompts (topics) and proficiency levels are provided. We therefore approach this as a classification task employing machine learning methods. Among the key concepts of machine learning, we focus on feature engineering. We design features across all the L1 languages not making us...
VTEX System Description for the NLI 2013 Shared Task
This paper describes the system developed for the NLI 2013 Shared Task, which requires identifying a writer’s native language from text written in English. I explore the given manually annotated data using word features such as length, endings and character trigrams. Furthermore, I employ k-NN classification. Modified TFIDF is used to generate a stop-word list automatically. The distance betw...
A Report on the 2017 Native Language Identification Shared Task
Native Language Identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is typically framed as a classification task where the set of L1s is known a priori. Two previous shared tasks on NLI have been organized where the aim was to identify the L1 of learners of English based on essays (2...
Publication date: 2013